Tackling Biased Baselines in the Risk-Sensitive Evaluation of Retrieval Systems
نویسندگان
چکیده
The aim of optimising information retrieval (IR) systems using a risksensitive evaluation methodology is to minimise the risk of performing any particular topic less effectively than a given baseline system. Baseline systems in this context determine the reference effectiveness for topics, relative to which the effectiveness of a given IR system in minimising the risk will be measured. However, the comparative risk-sensitive evaluation of a set of diverse IR systems – as attempted by the TREC 2013 Web track – is challenging, as the different systems under evaluation may be based upon a variety of different (base) retrieval models, such as learning to rank or language models. Hence, a question arises about how to properly measure the risk exhibited by each system. In this paper, we argue that no model of information retrieval alone is representative enough in this respect to be a true reference for the models available in the current state-of-the-art, and demonstrate, using the TREC 2012 Web track data, that as the baseline system changes, the resulting risk-based ranking of the systems changes significantly. Instead of using a particular system’s effectiveness as the reference effectiveness for topics, we propose several remedies including the use of mean within-topic system effectiveness as a baseline, which is shown to enable unbiased measurements of the risk-sensitive effectiveness of IR systems.
منابع مشابه
Tackling uncertainty in safety risk analysis in process systems: The case of gas pressure reduction stations
Industrial plants are subjected to very dangerous events. Therefore, it is very essential to carry out an efficient risk and safety analysis. In classical applications, risk analysis treats event probabilities as certain data, while there is much penurious knowledge and uncertainty in generic failure data that will lead to biased and inconsistent alternative estimates. Then, in order to achieve...
متن کاملPerformance Evaluation of Medical Image Retrieval Systems Based on a Systematic Review of the Current Literature
Background and Aim: Image, as a kind of information vehicle which can convey a large volume of information, is important especially in medicine field. Existence of different attributes of image features and various search algorithms in medical image retrieval systems and lack of an authority to evaluate the quality of retrieval systems, make a systematic review in medical image retrieval system...
متن کاملReview of ranked-based and unranked-based metrics for determining the effectiveness of search engines
Purpose: Traditionally, there have many metrics for evaluating the search engine, nevertheless various researchers’ proposed new metrics in recent years. Aware of this new metrics is essential to conduct research on evaluation of the search engine field. So, the purpose of this study was to provide an analysis of important and new metrics for evaluating the search engines. Methodology: This is ...
متن کاملNEW CRITERIA FOR RULE SELECTION IN FUZZY LEARNING CLASSIFIER SYSTEMS
Designing an effective criterion for selecting the best rule is a major problem in theprocess of implementing Fuzzy Learning Classifier (FLC) systems. Conventionally confidenceand support or combined measures of these are used as criteria for fuzzy rule evaluation. In thispaper new entities namely precision and recall from the field of Information Retrieval (IR)systems is adapted as alternative...
متن کاملUsing Predicate-Argument Structures for Context-Dependent Opinion Retrieval
Current opinion retrieval techniques do not provide context-dependent relevant results. They use frequency of opinion words in documents or at proximity to query words, such that opinionated documents containing the words are retrieved regardless of their contextual or semantic relevance to the query topic. Thus, opinion retrieved for the qualitative analysis of products, performance measuremen...
متن کامل